Using Mutual Information to Resolve Query Translation Ambiguities and Query Term Weighting
Abstract
An easy way to translate queries from one language into another for cross-language information retrieval (IR) is to use a simple bilingual dictionary. Because such dictionaries are general-purpose, however, this simple method yields a severe translation ambiguity problem. This paper describes the degree to which this problem arises in Korean-English cross-language IR and suggests a relatively simple yet effective method for disambiguation using mutual information statistics obtained only from the target document collection. In this method, mutual information is used not only to select the best translation candidate but also to assign a weight to query terms in the target language. Our experimental results on the TREC-6 collection show that this method can achieve up to 85% of the performance of monolingual retrieval and 96% of the performance of manual disambiguation.
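As a rough illustration of the idea in the abstract, the sketch below picks one translation per query term and weights it using co-occurrence statistics from a target-language collection. The abstract does not give the paper's exact formulas, so everything here is an assumption: the document-level pointwise mutual information estimate, the rule of scoring a candidate by its highest mutual information with any candidate of the other query terms, the choice to treat never co-occurring pairs as 0.0, and the hypothetical names `build_stats`, `pmi`, and `disambiguate`. The toy collection and candidate lists are invented for the example.

```python
import math
from collections import Counter
from itertools import combinations


def build_stats(docs):
    """Count document frequencies of words and of unordered word pairs."""
    df, pair_df = Counter(), Counter()
    for doc in docs:
        words = sorted(set(doc.lower().split()))
        df.update(words)
        pair_df.update(combinations(words, 2))  # pairs are in sorted order
    return df, pair_df, len(docs)


def pmi(x, y, df, pair_df, n):
    """Pointwise mutual information log2(P(x,y) / (P(x) * P(y))).

    Assumption: pairs that never co-occur get 0.0 instead of -infinity.
    """
    joint = pair_df[tuple(sorted((x, y)))]
    if joint == 0:
        return 0.0
    return math.log2(joint * n / (df[x] * df[y]))


def disambiguate(candidates, df, pair_df, n):
    """Choose one candidate per source term and return (candidate, weight).

    candidates: one list of target-language translations per source term.
    Assumed scoring rule: a candidate's score is its highest PMI with any
    candidate of the other source terms; the score doubles as the weight.
    """
    result = []
    for i, cands in enumerate(candidates):
        others = [c for j, cs in enumerate(candidates) if j != i for c in cs]

        def score(c):
            return max((pmi(c, o, df, pair_df, n) for o in others), default=0.0)

        best = max(cands, key=score)
        result.append((best, score(best)))
    return result


# Toy target-language collection (invented for illustration).
docs = [
    "bank interest rate loan",
    "bank interest deposit",
    "ginkgo tree leaf",
    "public concern health",
    "tree leaf health",
]
df, pair_df, n = build_stats(docs)

# Two source-language query terms, each with two dictionary translations.
candidates = [["bank", "ginkgo"], ["interest", "concern"]]
result = disambiguate(candidates, df, pair_df, n)
print(result)
```

In this toy collection "bank" and "interest" co-occur in two documents, so they are selected over "ginkgo" and "concern" and both receive the same positive weight. A real system would estimate the statistics from the full target collection and might use a co-occurrence window smaller than a whole document.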
Similar resources
Analysis of User query refinement behavior based on semantic features: user log analysis of Ganj database (IranDoc)
Background and Aim: Information systems cannot be well designed or developed without a clear understanding of users’ needs and of how they seek and evaluate information. This research was designed to analyze the query refinement behavior of users of Ganj (the Iranian research institute of science and technology database) through log analysis. Methods: The method of this research is log anal...
Term Selection Term Selection Query - language Term Translation Doc - language Term Selection Term Weighting Term Matching Term Weighting Term Matching
This paper presents results for the Japanese/English cross-language information retrieval task on the NACSIS Test Collection. Two automatic dictionary-based query translation techniques were tried with four variants of the queries. The results indicate that longer queries outperform the required description-only queries and that use of the first translation in the edict dictionary is comparable w...
Improving query translation in English-Korean cross-language information retrieval
Query translation is a viable method for cross-language information retrieval (CLIR), but it suffers from translation ambiguities caused by multiple translations of individual query terms. Previous research has employed various methods for disambiguation, including the method of selecting an individual target query term from multiple candidates by comparing their statistical associations with t...
Improving Query Translation for Cross-Language Information Retrieval using a Web-based Approach
With the increasing popularity of the Internet, research on Cross-Language Information Retrieval (CLIR) is receiving much attention. Existing approaches to improving query translation, such as noun phrase (NP) identification and translation and word translation selection, require special corpus resources. However, those natural language resources are not readily available. In this paper, we propos...
QEA: A New Systematic and Comprehensive Classification of Query Expansion Approaches
A major problem in information retrieval is the difficulty of defining the user's information needs; on the other hand, when the user submits a query, there is a vast amount of information to retrieve. Different methods have therefore been suggested for query expansion, which is concerned with reconfiguring the query by increasing efficiency and improving the criterion accuracy in the information...